{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using llama-parse with AstraDB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we show a basic RAG-style example that uses `llama-parse` to parse a PDF document, store the corresponding document into a vector store (`AstraDB`) and finally, perform some basic queries against that store. The notebook is modeled after the quick start notebooks and hence is meant as a way of getting started with `llama-parse`, backed by a vector database."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Requirements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# First, install the required dependencies\n",
    "%pip install --quiet llama-index llama-parse llama-index-vector-stores-astra-db llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Configuration"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import openai\n",
    "\n",
    "from getpass import getpass\n",
    "\n",
    "# Get all required API keys and parameters\n",
    "llama_cloud_api_key = getpass(\"Enter your Llama Index Cloud API Key: \")\n",
    "api_endpoint = input(\"Enter your Astra DB API Endpoint: \")\n",
    "token = getpass(\"Enter your Astra DB Token: \")\n",
    "namespace = (\n",
    "    input(\"Enter your Astra DB namespace (optional, must exist on Astra): \") or None\n",
    ")\n",
    "openai_api_key = getpass(\"Enter your OpenAI API Key: \")\n",
    "\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = llama_cloud_api_key\n",
    "openai.api_key = openai_api_key"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# llama-parse is async-first, running the sync code in a notebook requires the use of nest_asyncio\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using llama-parse to parse a PDF"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Download complete.\n"
     ]
    }
   ],
   "source": [
    "# Grab a PDF from Arxiv for indexing\n",
    "import requests\n",
    "\n",
    "# The URL of the file you want to download\n",
    "url = \"https://arxiv.org/pdf/1706.03762.pdf\"\n",
    "# The local path where you want to save the file\n",
    "file_path = \"./attention.pdf\"\n",
    "\n",
    "# Perform the HTTP request\n",
    "response = requests.get(url)\n",
    "\n",
    "# Check if the request was successful\n",
    "if response.status_code == 200:\n",
    "    # Open the file in binary write mode and save the content\n",
    "    with open(file_path, \"wb\") as file:\n",
    "        file.write(response.content)\n",
    "    print(\"Download complete.\")\n",
    "else:\n",
    "    print(\"Error downloading the file.\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id ce3909a7-54cf-438b-849a-fe9a903b0c71\n"
     ]
    }
   ],
   "source": [
    "from llama_parse import LlamaParse\n",
    "\n",
    "documents = LlamaParse(result_type=\"text\").load_data(file_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'rmer - model architecture.\\nThe Transformer follows this overall architecture using stacked self-attention and point-wise, fully\\nconnected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,\\nrespectively.\\n3.1   Encoder and Decoder Stacks\\nEncoder:     The encoder is composed of a stack of N = 6 identical layers. Each layer has two\\nsub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-\\nwise fully connected feed-forward network. We employ a residual connection [11] around each of\\nthe two sub-layers, followed by layer normalization [1]. That is, the output of each sub-layer is\\nLayerNorm(x + Sublayer(x)), where Sublayer(x) is the function implemented by the sub-layer\\nitself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding\\nlayers, produce outputs of dimension dmodel = 512.\\nDecoder:    The decoder is also composed of a stack of N = 6 identical layers. In addition '"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Take a quick look at some of the parsed text from the document:\n",
    "documents[0].get_content()[10000:11000]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Storing into Astra DB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.vector_stores.astra_db import AstraDBVectorStore\n",
    "\n",
    "astra_db_store = AstraDBVectorStore(\n",
    "    token=token,\n",
    "    api_endpoint=api_endpoint,\n",
    "    namespace=namespace,\n",
    "    collection_name=\"astra_v_table_llamaparse\",\n",
    "    embedding_dimension=1536,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.node_parser import SimpleNodeParser\n",
    "\n",
    "node_parser = SimpleNodeParser()\n",
    "\n",
    "nodes = node_parser.get_nodes_from_documents(documents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.core import VectorStoreIndex, StorageContext\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=astra_db_store)\n",
    "\n",
    "index = VectorStoreIndex(\n",
    "    nodes=nodes,\n",
    "    storage_context=storage_context,\n",
    "    embed_model=OpenAIEmbedding(api_key=openai_api_key),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Simple RAG Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine(similarity_top_k=15)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "***********New LlamaParse+ Basic Query Engine***********\n",
      "Multi-Head Attention is also known as multi-headed self-attention.\n"
     ]
    }
   ],
   "source": [
    "query = \"What is Multi-Head Attention also known as?\"\n",
    "\n",
    "response_1 = query_engine.query(query)\n",
    "print(\"\\n***********New LlamaParse+ Basic Query Engine***********\")\n",
    "print(response_1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'We used beam search as described in the previous section, but no\\ncheckpoint averaging. We present these results in Table 3.\\nIn Table 3 rows (A), we vary the number of attention heads and the attention key and value dimensions,\\nkeeping the amount of computation constant, as described in Section 3.2.2. While single-head\\nattention is 0.9 BLEU worse than the best setting, quality also drops off with too many heads.\\nIn Table 3 rows (B), we observe that reducing the attention key size dk hurts model quality. This\\nsuggests that determining compatibility is not easy and that a more sophisticated compatibility\\nfunction than dot product may be beneficial. We further observe in rows (C) and (D) that, as expected,\\nbigger models are better, and dropout is very helpful in avoiding over-fitting. In row (E) we replace our\\nsinusoidal positional encoding with learned positional embeddings [9], and observe nearly identical\\nresults to the base model.\\n6.3    English Constituency Parsing\\nTo evaluate if the Transformer can generalize to other tasks we performed experiments on English\\nconstituency parsing. This task presents specific challenges: the output is subject to strong structural\\nconstraints and is significantly longer than the input. Furthermore, RNN sequence-to-sequence\\nmodels have not been able to attain state-of-the-art results in small-data regimes [37].\\nWe trained a 4-layer transformer with dmodel = 1024 on the Wall Street Journal (WSJ) portion of the\\nPenn Treebank [25], about 40K training sentences. We also trained it in a semi-supervised setting,\\nusing the larger high-confidence and BerkleyParser corpora from with approximately 17M sentences\\n[37]. We used a vocabulary of 16K tokens for the WSJ only setting and a vocabulary of 32K tokens\\nfor the semi-supervised setting.\\nWe performed only a small number of experiments to select the dropout, both attention and residual\\n(section 5.4), learning rates and beam size on the Section 22 development set, all other parameters\\nremained unchanged from the English-to-German base translation model. During inference, we\\n                                                             9\\n---\\nTable 4: The Transformer generalizes well to English constituency parsing (Results are on Section 23\\nof WSJ)\\n                          Parser                           Training             WSJ 23 F1\\n            Vinyals & Kaiser el al. (2014) [37]    WSJ only, discriminative         88.3\\n                 Petrov et al. (2006) [29]         WSJ only, discriminative         90.4\\n                   Zhu et al. (2013) [40]          WSJ only, discriminative         90.4\\n                   Dyer et al. (2016) [8]          WSJ only, discriminative         91.7\\n                  Transformer (4 layers)           WSJ only, discriminative         91.3\\n                   Zhu et al. (2013) [40]              semi-supervised              91.3\\n               Huang & Harper (2009) [14]              semi-supervised              91.3\\n               McClosky et al. (2006) [26]             semi-supervised              92.1\\n            Vinyals & Kaiser el al. (2014) [37]        semi-supervised              92.1\\n                  Transformer (4 layers)               semi-supervised              92.7\\n                 Luong et al. (2015) [23]                  multi-task               93.0\\n                   Dyer et al. (2016) [8]                  generative               93.3\\nincreased the maximum output length to input length + 300. We used a beam size of 21 and α = 0.3\\nfor both WSJ only and the semi-supervised setting.\\nOur results in Table 4 show that despite the lack of task-specific tuning our model performs sur-\\nprisingly well, yielding better results than all previously reported models with the exception of the\\nRecurrent Neural Network Grammar [8].\\nIn contrast to RNN sequence-to-sequence models [37], the Transformer outperforms the Berkeley-\\nParser [29] even when training only on the WSJ training set of 40K sentences.\\n7    Conclusion\\nIn this work, we presented the Transformer, the first sequence transduction model based entirely on\\nattention, replacing the recurrent layers most commonly used in encoder-decoder architectures with\\nmulti-headed self-attention.\\nFor translation tasks, the Transformer can be trained significantly faster than architectures based\\non recurrent or convolutional layers.'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Take a look at one of the source nodes from the response\n",
    "response_1.source_nodes[0].get_content()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
